Pathway and Design
1 Project Background
1.1 DNA Computer & Electronic Computer
A DNA algorithm uses the molecular structure of DNA and the principle of complementary base pairing to encode information and algorithms, so that the problem to be solved can be mapped onto specific DNA molecular segments. Solutions are then generated through controllable biochemical reactions. The final result can be detected and read out with a variety of modern molecular biology techniques, such as Polymerase Chain Reaction (PCR), overlapping amplification polymerization (POA), ultrasonic degradation, affinity chromatography, molecular purification, cloning, mutagenesis, electrophoresis, separation and sequencing.
Research shows that DNA computing, as a new kind of molecular computing model, has advantages that the electronic computer cannot match.
Taking the Hamiltonian path problem as an example, the energy consumption of a DNA computer is much lower than that of an electronic computer.
Its computation time is also far shorter. When no exact algorithm is available for an NP problem, exhaustive search is the only option, and this is where the DNA computer has the greater advantage. The figures below illustrate the comparison.
a. Schematic diagram of energy consumption
b. Schematic diagram of computation time
1.2 Project Description in 2014
In the 2014 AHUT_China project, we built a shortest-path calculation model based on DNA computing and illustrated the principle of the computation with a navigation route-planning case.
The figure below shows the route from node No.2 (the hotel) to node No.8 (the Convention Centre), with node No.5 (the museum) as a required intermediate node. Which path covers the fewest nodes?
First, we converted every point and path on the map into a single-stranded DNA with a distinct 40bp sequence. We set Site No.2 as the starting point and Site No.8 as the end point, and required that Site No.5 be passed. A specific restriction enzyme cutting site was then built into the DNA sequence of each of these points. The processed DNA fragments were put into a PCR, where they could connect freely. After amplifying all the products, we obtained DNA of the appropriate length by preliminary screening with gel electrophoresis. Next, we used the enzyme cutting site of Site No.5 to insert GFP and screened out the paths that include Site No.5. We then screened for the paths containing Site No.2 and Site No.8 through TetR and LacI, which control GFP from Site No.2 and Site No.8. This completed the specific detection of the DNA sequences inserted into the plasmid and yielded the plasmids carrying the sequence information of Sites No.2, 5 and 8. Finally, the optimal path, 2-4-5-7-8, was determined by sequencing and comparing the result with the preset sequence information of each site.
1.3 Database Comparison
Compared with the 2014 project, we have made many improvements this year, mainly in four aspects.
Improvement on special information site
The 2014 project had no special information sites. Having identified this shortcoming, this year we added a 12-base information site to each line sequence. It contains three fields: an identification code, the path number and the path length, each expressed as a 4-base sequence.
Change in connection mode
In the 2014 project, DNA was connected by means of the DNA polymerization reaction, and there was no need to fill gaps in the DNA double strand. In this year's project, the 12-base information sites added to the line sequences leave a gap in the double strand after polymerization, and free bases are needed to fill it.
Expanded storage in information
In 2014, the DNA double strand formed from eight sites and twelve lines after the polymerization reaction carried no information. In the 2016 project, each of the 13 lines carries a 12-base information site that stores path information. If more information is needed, the number of bases can be increased. Compared with the 2014 project, the lines in this year's project can therefore store much more information.
Path selection
In terms of path selection, the 2014 project obtained the plasmids containing the three nodes (node 2, node 5 and node 8) by detecting fluorescent protein, and then determined the optimal path by sequencing and comparing the sequence information of each node. The 2016 project finds the optimal solution that couples the fewest nodes with the shortest path by sequencing and quaternary operations on the information sites. Because the 2014 project had to import several genes to determine whether a path passed through node 2, node 5 and node 8, this year's project, which relies directly on DNA sequencing, is simpler in terms of path selection.
In real life, one precondition for solving the shortest path problem is knowing the length of every pathway. The 12-base information site added to each pathway sequence this year turns the AHUT_China 2014 problem of finding the minimum number of nodes into the problem of finding the optimal solution that couples the minimum number of nodes with the shortest distance. This greatly improves its practicality.
1.4 Project Ideas
We have extended our study of the path planning problem addressed in 2014. First, supported by the new standard biological navigation information database, we design specific primers for nodes 2 and 8. Second, all the DNA carrying node and path information is allowed to connect randomly, which yields all possible path code combinations. Third, the paths with the fewest nodes are selected by gel electrophoresis. Finally, the optimal solution that couples the fewest nodes with the shortest path is found by sequencing and quaternary operations on the information sites.
2 Design
2.1 Theory
2.1.1 Abstract
We aim to build a new bio-navigational system that delivers an optimal pathway scheme faster and with a more convenient calculation process, breaking the limits on computational capacity in this era of big data. The system judges rapidly and automatically from the information stored in pathways and sites, and so provides users with a reliable optimal pathway scheme. In this project, we analyze all possible combinations of pathways and sites, taking Site No.2 as the starting point and Site No.8 as the end point.
2.1.2 Theory Preparation
Quaternary: In this project, the four quaternary digits correspond to the four bases (i.e. A=0, G=1, C=2, T=3) and are used to represent site and pathway information. Range of a four-digit quaternary number: 0000 to 3333 (AAAA to TTTT), i.e. 0 to 255 in decimal.
The conversion from decimal to base sequence is as follows:
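As a minimal sketch of this conversion, the Python below encodes a decimal number as a fixed-width quaternary base sequence and decodes it again. It assumes the digit-to-base mapping that reproduces the worked sequences later in this section (path number 0203 as ACAT, length 8 as AACA), namely A=0, G=1, C=2, T=3. The function names are illustrative and not part of the project.

```python
# Assumed mapping, inferred from the worked examples in this document:
# index in the string = quaternary digit, so A=0, G=1, C=2, T=3.
BASES = "AGCT"


def to_bases(value: int, width: int = 4) -> str:
    """Encode a non-negative integer as `width` bases (quaternary digits)."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"{value} does not fit in {width} quaternary digits")
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 4)  # peel off the lowest digit
        digits.append(BASES[digit])
    return "".join(reversed(digits))     # most significant digit first


def from_bases(seq: str) -> int:
    """Decode a base sequence back into a decimal integer."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


if __name__ == "__main__":
    for n in (0, 8, 9, 255):
        print(n, to_bases(n), from_bases(to_bases(n)))
```

With four bases the encodable range is 0 to 255; a longer field (a larger `width`) would be needed for larger values.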
2.1.3 Design of Sites and Lines
Design of Sites
a. A 40bp base sequence represents the information of each site.
E.g.:
Single strand DNA sequence of Site No.2:
GTAATGATCTCCTAGGAGATACATTCGATCGATCATGCTA
Single strand DNA sequence of Site No.8:
GTACAGTACGGTACCGTACCCGTGACGTACGTGATGACTG
b. Primer Design Description
All combinations start from Site No.2 and end at Site No.8. Primers are designed from the sequences of Site No.2 and Site No.8 so that all combinations can be amplified in bulk by PCR. Restriction enzyme cutting sites are designed into the primers, which lets us insert the target gene into the plasmid pSB1C3.
Design of Lines
Lines are represented by 52bp single-stranded DNA, of which 12bp carry the effective information: the first 4bp are the identification code GGGG, the middle 4bp give the number of the path, and the last 4bp give the length of the path.
a. Identification Code Sequence
The GGGG sequence is used as the identification code in this project's single-stranded DNA because runs of consecutive GGGG are rare in natural DNA, which makes the code clearly distinguishable from other DNA sequences in organisms and easy to identify.
b. Site Number Sequence
Four bases represent the number of the path as the digit combination "start site + end site". The numbers are again written in quaternary, with the four digits represented by A, G, C and T, and the start site and end site take two bases each. For example, for the path from Site No.2 to Site No.3, the path number is 0203 (sites 02 and 03), which is ACAT as a base sequence.
c. Path Length Sequence
A quaternary number consisting of four bases is used to represent path length.
To sum up, take Paths <2,4> and <2,6> as examples:
<2,4> = 8 miles
TGTAAGCTAGCTAGTACGATGGGGACGAAACATCCTACTTCGGTCGACCTCG
<2,6> = 9 miles
TGTAAGCTAGCTAGTACGATGGGGACGCAACGAATGCTCGAGCGCTATGGAAC
As described above, in the 12bp information segment of Lines <2,4> and <2,6> (the 12 bases that follow the first 20 bases, highlighted in red in the original), the first four bases carry the identification code, the middle four carry the site information of the directed path, and the last four carry the path length.
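The field layout can be checked against the two example lines with a short decoder. This is a sketch under stated assumptions, not the team's own software: it assumes the information segment always starts at base 21 (after a 20-base flank) and that the digit mapping is A=0, G=1, C=2, T=3, which is the mapping that reproduces the path numbers and lengths given for <2,4> and <2,6>. The name `decode_line` is hypothetical.

```python
BASES = "AGCT"   # assumed mapping: A=0, G=1, C=2, T=3
FLANK_BP = 20    # assumed length of the flank before the information segment
ID_CODE = "GGGG"


def quaternary(seq: str) -> int:
    """Read a base sequence as a quaternary number."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def decode_line(line: str) -> tuple[int, int, int]:
    """Return (start site, end site, path length) stored in a line sequence."""
    info = line[FLANK_BP:FLANK_BP + 12]
    code, number, length = info[:4], info[4:8], info[8:]
    if code != ID_CODE:
        raise ValueError(f"identification code {code!r} is not {ID_CODE}")
    # The path number is two bases for the start site, two for the end site.
    return quaternary(number[:2]), quaternary(number[2:]), quaternary(length)


LINE_2_4 = "TGTAAGCTAGCTAGTACGATGGGGACGAAACATCCTACTTCGGTCGACCTCG"
LINE_2_6 = "TGTAAGCTAGCTAGTACGATGGGGACGCAACGAATGCTCGAGCGCTATGGAAC"

if __name__ == "__main__":
    print(decode_line(LINE_2_4))  # start, end, length for Line <2,4>
    print(decode_line(LINE_2_6))  # start, end, length for Line <2,6>
```

Reading the fields at a fixed offset avoids searching for GGGG, which could match elsewhere if a flank or field happened to contain a run of G.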
2.1.4 Bridging of Sites and Lines
Randomly combining sites and lines yields all possible site-and-path combinations from Site No.2 to Site No.8.
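The bridging relies on complementary base pairing between a site and the flank of a line. The sketch below checks this for the sequences quoted in 2.1.3, assuming a 20-base overlap: the first 20 bases of Lines <2,4> and <2,6> are the base-by-base complement of the last 20 bases of Site No.2. The sequences are compared as written, without reversal, because the text does not state the strand orientation. The helper `pairs_with` is an illustrative name.

```python
# Watson-Crick complement table: A<->T, C<->G.
COMPLEMENT = str.maketrans("ATCG", "TAGC")

SITE_2 = "GTAATGATCTCCTAGGAGATACATTCGATCGATCATGCTA"
SITE_8 = "GTACAGTACGGTACCGTACCCGTGACGTACGTGATGACTG"
LINE_2_4 = "TGTAAGCTAGCTAGTACGATGGGGACGAAACATCCTACTTCGGTCGACCTCG"
LINE_2_6 = "TGTAAGCTAGCTAGTACGATGGGGACGCAACGAATGCTCGAGCGCTATGGAAC"


def pairs_with(site: str, line: str, overlap: int = 20) -> bool:
    """True if the line's first `overlap` bases complement the site's last ones."""
    return line[:overlap] == site[-overlap:].translate(COMPLEMENT)


if __name__ == "__main__":
    print(pairs_with(SITE_2, LINE_2_4))  # line leaving Site No.2
    print(pairs_with(SITE_2, LINE_2_6))  # line leaving Site No.2
    print(pairs_with(SITE_8, LINE_2_4))  # Site No.8 is not the start of <2,4>
```

Only the start-site end of each line can be checked here, since the sequences of Sites No.4 and No.6 are not listed in this section.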
2.1.5 All Possible Answers
All possible answers, listed by exhaustion:
<2-4-8>
<2-6-8>
<2-1-4-8>
<2-3-6-8>
<2-4-6-8>
<2-6-4-8>
<2-1-4-6-8>
<2-3-6-4-8>
<2-1-4-5-7-8>
<2-6-4-5-7-8>
<2-3-6-4-5-7-8>
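To illustrate how gel electrophoresis can separate these answers by node count, the sketch below takes the paths exactly as listed and estimates the length of each assembled double strand. The length formula is an assumption, not a figure from the project: each site contributes 40bp and each line overlaps its two neighbouring sites by 20 bases, so a line adds only its 12-base information segment. Primer overhangs and restriction sites are ignored.

```python
# Paths copied from the list above.
PATHS = [
    (2, 4, 8),
    (2, 6, 8),
    (2, 1, 4, 8),
    (2, 3, 6, 8),
    (2, 4, 6, 8),
    (2, 6, 4, 8),
    (2, 1, 4, 6, 8),
    (2, 3, 6, 4, 8),
    (2, 1, 4, 5, 7, 8),
    (2, 6, 4, 5, 7, 8),
    (2, 3, 6, 4, 5, 7, 8),
]

SITE_BP = 40       # length of each site sequence
LINE_INFO_BP = 12  # assumed net contribution of each line (information segment)


def assembled_length(path: tuple[int, ...]) -> int:
    """Estimated length in bp of the filled-in double strand for a path."""
    return SITE_BP * len(path) + LINE_INFO_BP * (len(path) - 1)


# Paths with the fewest nodes give the shortest product on the gel.
fewest = min(len(path) for path in PATHS)
candidates = [path for path in PATHS if len(path) == fewest]

if __name__ == "__main__":
    for path in PATHS:
        print("-".join(map(str, path)), len(path), assembled_length(path))
    print("fewest-node candidates:", candidates)
```

Under this assumption every path with the same number of nodes has the same length, so the gel narrows the answers to the fewest-node candidates, and the choice among them depends on the path lengths read from the information sites by sequencing.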
2.2 Experimental procedure
1. Dilute all sites and lines to 50 μmol/L. (2016.6.2)
2. DNA polymerization reaction: the sites and lines were connected by complementary base pairing, and the remaining single-stranded regions were then filled in. (2016.7.14)
3. DNA ligation reaction: the "gaps" in the double-stranded DNA were joined by phosphodiester bonds, catalyzed by T4 DNA ligase. (2016.7.14)
4. PCR amplification: the different DNA combinations were amplified by PCR using primers specific to Sites 2 and 8. (2016.7.15)
5. DNA gel electrophoresis. (2016.7.15)
6. Identify the band of the corresponding DNA length and perform gel extraction. (2016.7.19)
7. The extracted DNA fragments were digested with PstI and EcoRI. (2016.7.23)
8. The same enzyme digestion was carried out on the plasmids. (2016.7.23)
9. After digestion, the DNA fragments were inserted into the digested plasmid pSB1C3. (2016.7.24)
10. Transformation: the ligated plasmids were transformed into E. coli. (2016.7.24)
11. Screening: the required clones were selected on medium containing chloramphenicol. (2016.7.25)
12. The selected clones were grown in liquid medium, and the plasmids were then extracted from the E. coli. (2016.7.25)
13. Identify the plasmids by PCR and enzyme digestion. (2016.7.26)
14. Sequencing (samples sent to a company for sequencing). (2016.7.31)
2.3 Conclusion
Building on our AHUT_China 2014 project, we have made further improvements this year through inheritance and innovation. In this project, we redesigned the lines so that their effective information is more applicable to complex practical path problems. In addition, we greatly simplified the experimental procedure, increasing efficiency and reducing cost, in order to take full advantage of the big data processing capability of the bio-computer and to take a first step toward commercial production and use.